Loss is its own Reward: Self-Supervision for Reinforcement Learning
Authors
Abstract
Reinforcement learning optimizes policies for expected cumulative reward. Need the supervision be so narrow? Reward is delayed and sparse for many tasks, making it a difficult and impoverished signal for end-to-end optimization. To augment reward, we consider a range of self-supervised tasks that incorporate states, actions, and successors to provide auxiliary losses. These losses offer ubiquitous and instantaneous supervision for representation learning even in the absence of reward. While current results show that learning from reward alone is feasible, pure reinforcement learning methods are constrained by computational and data efficiency issues that can be remedied by auxiliary losses. Self-supervised pre-training and joint optimization improve the data efficiency and policy returns of end-to-end reinforcement learning.
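The joint objective described above can be sketched minimally. This is an illustrative assumption, not the paper's exact architecture: a shared linear encoder feeds both a policy head and a self-supervised forward-prediction head (predict the successor state from state and action), and the auxiliary loss is added to a REINFORCE-style surrogate with an assumed weight of 0.1. The point of the sketch is that even when the return is zero (sparse reward), the auxiliary term still supplies a nonzero training signal.

```python
# Minimal sketch (assumed setup): an RL surrogate loss augmented with a
# self-supervised auxiliary loss. All layer sizes and the 0.1 weight are
# illustrative choices, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_actions, hidden = 4, 3, 8

# Shared encoder with two heads: policy logits and successor-state prediction.
W_enc = rng.normal(scale=0.1, size=(state_dim, hidden))
W_pi = rng.normal(scale=0.1, size=(hidden, n_actions))
W_fwd = rng.normal(scale=0.1, size=(hidden + n_actions, state_dim))

def softmax(x):
    z = np.exp(x - x.max())  # shift for numerical stability
    return z / z.sum()

def losses(s, a, s_next, ret):
    h = np.tanh(s @ W_enc)                       # shared representation
    pi = softmax(h @ W_pi)                       # policy over actions
    rl_loss = -ret * np.log(pi[a])               # REINFORCE surrogate
    a_onehot = np.eye(n_actions)[a]
    s_pred = np.concatenate([h, a_onehot]) @ W_fwd
    aux_loss = np.mean((s_pred - s_next) ** 2)   # forward-prediction loss
    return rl_loss, aux_loss

s = rng.normal(size=state_dim)
s_next = rng.normal(size=state_dim)
rl, aux = losses(s, a=1, s_next=s_next, ret=0.0)  # sparse reward: return is 0
total = rl + 0.1 * aux  # joint objective; auxiliary term still gives signal
```

With `ret=0.0` the RL surrogate vanishes, yet `total` remains positive because the forward-prediction error supervises the shared representation on every transition.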
Similar papers
Modelling structural relations of craving based on sensitivity to reinforcement, distress tolerance and self-compassion with the mediating role of self-efficacy for quitting
Background & Objectives: Craving is a major barrier to the effective treatment of substance addiction. This study was conducted to model structural relations of craving based on sensitivity to reinforcement, distress tolerance and self-compassion, with the mediating role of self-efficacy for quitting. Materials and Methods: This study used a descriptive-correlational design. The...
Modeling Others using Oneself in Multi-Agent Reinforcement Learning
We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so the agents must infer the other players’ hidden goals from their observed behavior in order to solve the tasks. We propose a new approach for learning in these domains: Self O...
Hierarchical Reinforcement Learning Based Self-balancing Algorithm for Two-wheeled Robots
Abstract: Self-balancing control is the basis for applications of two-wheeled robots. In order to improve the self-balancing of two-wheeled robots, we propose a hierarchical reinforcement learning algorithm for controlling the balance of two-wheeled robots. After describing the subgoals of hierarchical reinforcement learning, we extract features for subgoals, define a feature value vector and it...
Autonomous Learning of Reward Distribution for Each Agent in Multi-Agent Reinforcement Learning
A novel approach to reward distribution in multi-agent reinforcement learning is proposed. An agent that receives a reward gives a part of it to the other agents. If an agent gives a part of its own reward to the others, they may help it to obtain more reward. There may be cases in which the agent gains more reward than it gave away. In this case, it is better for...
Journal: CoRR
Volume: abs/1612.07307
Pages: -
Year of publication: 2016